AMENDMENTS TO THE CLAIMS 

1. (Currently Amended) A language model generation and accumulation apparatus
that generates and accumulates language models for speech recognition, the apparatus 
comprising: 

a higher-level N-gram language model generation and accumulation unit operable to generate
and accumulate a higher-level N-gram language model that is obtained by modeling each of a
plurality of texts as a sequence of words that includes a word string class indicating a linguistic
property of a word string constituting two or more words; and

a lower-level N-gram language model generation and accumulation unit operable to generate
and accumulate a lower-level N-gram language model that is obtained by modeling a first sequence
of two or more words within the word string class having a specific linguistic property; and

a higher-level N-gram language model generation and accumulation unit operable to generate
and accumulate a higher-level N-gram language model that is obtained by modeling the first
sequence of words modeled in the lower-level N-gram language model as a word string class and a
plurality of texts as a second sequence of words that includes the word string class.

2. (Original) The language model generation and accumulation apparatus according to 
Claim 1, 

wherein the higher-level N-gram language model generation and accumulation unit and 
the lower-level N-gram language model generation and accumulation unit generate the 
respective language models, using different corpuses. 

3. (Original) The language model generation and accumulation apparatus according to 
Claim 2, 

wherein the lower-level N-gram language model generation and accumulation unit 
includes 

a corpus update unit operable to update the corpus for the lower-level N-gram language 
model, and 
the lower-level N-gram language model generation and accumulation unit updates the
lower-level N-gram language model based on the updated corpus, and generates the updated 
lower-level N-gram language model. 

4. (Currently Amended) The language model generation and accumulation 
apparatus according to Claim 1, 

wherein the lower-level N-gram language model generation and accumulation unit 
analyzes a first sequence of words within the word string class into one or more morphemes
that are the smallest language units having meanings, and generates the lower-level N-gram 
language model by modeling each sequence of the one or more morphemes based on the word 
string class. 

5. (Previously Presented) The language model generation and accumulation 
apparatus according to Claim 1, 

wherein the higher-level N-gram language model generation and accumulation unit 
substitutes the word string class with a virtual word, and then generates the higher-level N-gram 
language model by modeling a sequence made up of the virtual word and other words, the word 
string class being included in each of the plurality of texts analyzed into morphemes. 
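As an illustrative sketch only, and not part of the claim language, the substitution in Claim 5 can be pictured in Python as follows. The sketch assumes that morphemic analysis and the positions of the word string classes are already available. It uses hypothetical class labels such as "<TITLE>" as the virtual words. It estimates a maximum-likelihood bigram model, which is one possible N-gram estimator and not the one the claim requires.

```python
from collections import Counter


def substitute_classes(morphemes, class_spans):
    """Replace each word string class span with a single virtual word.

    class_spans: list of (start, end, label) tuples with `end` exclusive,
    assumed sorted and non-overlapping. `label` (a hypothetical name such as
    "<TITLE>") serves as the virtual word.
    """
    out, i = [], 0
    for start, end, label in class_spans:
        out.extend(morphemes[i:start])  # ordinary words before the class
        out.append(label)               # the whole word string becomes one token
        i = end
    out.extend(morphemes[i:])           # ordinary words after the last class
    return out


def train_bigram(sentences):
    """Maximum-likelihood bigram model: P(w2 | w1) = c(w1, w2) / c(w1)."""
    pair_counts, history_counts = Counter(), Counter()
    for words in sentences:
        padded = ["<s>"] + list(words) + ["</s>"]
        for w1, w2 in zip(padded, padded[1:]):
            pair_counts[(w1, w2)] += 1
            history_counts[w1] += 1
    return {pair: c / history_counts[pair[0]] for pair, c in pair_counts.items()}
```

For example, substitute_classes(["record", "the", "lord", "of", "the", "rings", "tonight"], [(1, 6, "<TITLE>")]) yields ["record", "<TITLE>", "tonight"]. This then enters the higher-level counts as an ordinary three-token sentence.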

6. (Previously Presented) The language model generation and accumulation 
apparatus according to Claim 1, 

wherein the lower-level N-gram language model generation and accumulation unit 
includes 

an exception word judgment unit operable to judge whether or not a specific word out of 
a plurality of words that appear in the word string class should be treated as an exception word, 
based on a linguistic property of the specific word, and divides the exception word into (i) a 
syllable that is a basic phonetic unit constituting a pronunciation of the exception word and (ii) a 
unit that is obtained by combining syllables based on a judgment result, the exception word 
being a word not being included as a constituent word of the word string class, and 
the language model generation and accumulation apparatus further comprises
a class dependent syllable N-gram generation and accumulation unit operable to generate 
class dependent syllable N-grams by modeling a sequence made up of the syllable and the unit 
obtained by combining syllables and by providing a language likelihood to the sequence in 
dependency on either the word string class or the linguistic property of the exception word, and 
accumulate the generated class dependent syllable N-grams, the language likelihood being a 
logarithm value of a probability.
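A minimal sketch of the class dependent syllable N-gram of Claim 6, offered for illustration only, rests on several assumptions:

- The exception word has already been divided into syllables. How that division is made is outside the sketch.
- Each syllable, or each unit obtained by combining syllables, is treated as one token.
- N = 2 (bigrams).
- The language likelihood is the natural logarithm of an unsmoothed maximum-likelihood probability.

The class labels and the boundary tokens <w> and </w> are hypothetical.

```python
import math
from collections import Counter, defaultdict


class ClassDependentSyllableBigram:
    """Syllable bigrams conditioned on a class label (hypothetical sketch).

    Each training item is (class_label, syllables), where `syllables` is an
    exception word already split into syllables or combined-syllable units.
    """

    def __init__(self):
        self.pairs = defaultdict(Counter)      # class -> counts of (s1, s2)
        self.histories = defaultdict(Counter)  # class -> counts of s1

    def train(self, items):
        for label, syllables in items:
            padded = ["<w>"] + list(syllables) + ["</w>"]
            for s1, s2 in zip(padded, padded[1:]):
                self.pairs[label][(s1, s2)] += 1
                self.histories[label][s1] += 1

    def log_likelihood(self, label, syllables):
        """Sum of log P(s2 | s1, class); -inf if any bigram is unseen."""
        padded = ["<w>"] + list(syllables) + ["</w>"]
        total = 0.0
        for s1, s2 in zip(padded, padded[1:]):
            c = self.pairs[label][(s1, s2)]
            if c == 0:
                return float("-inf")
            total += math.log(c / self.histories[label][s1])
        return total
```

Because the counts are kept per class, the same syllable string can receive different likelihoods depending on the word string class (or linguistic property) in which the exception word appears. A practical system would add smoothing so that unseen syllable pairs do not score -inf.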

7. (Previously Presented) The language model generation and accumulation 
apparatus according to Claim 1, further comprising 

a syntactic tree generation unit operable to perform morphemic analysis as well as 
syntactic analysis of a text, and generate a syntactic tree in which the text is structured by a 
plurality of layers, focusing on a node that is on the syntactic tree and that has been selected on 
the basis of a predetermined criterion, 

wherein the higher-level N-gram language model generation and accumulation unit 
generates the higher-level N-gram language model for syntactic tree, using a first subtree that 
constitutes an upper layer from the focused node, and 

the lower-level N-gram language model generation and accumulation unit generates the 
lower-level N-gram language model for syntactic tree, using a second subtree that constitutes a 
lower layer from the focused node. 
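The division of a syntactic tree at the focused node in Claim 7 can be sketched as follows, for illustration only. The Node structure, the choice of the focused node by the caller, and the labels in the example are hypothetical. The claim does not specify a tree representation or the selection criterion.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                                    # phrase label or word
    children: list = field(default_factory=list)  # empty for a leaf word

    def leaves(self):
        """Left-to-right words (leaf labels) of this subtree."""
        if not self.children:
            return [self.label]
        return [w for child in self.children for w in child.leaves()]


def split_at(root, focus):
    """Split `root` at the focused node.

    Returns (upper, lower): `upper` is a copy of the tree in which the focused
    node is collapsed into a single leaf carrying its label, and `lower` is the
    focused subtree itself (None if `focus` is not in the tree). The focus is
    matched by identity, not by label.
    """
    if root is focus:
        return Node(focus.label), focus
    upper_children, lower = [], None
    for child in root.children:
        sub_upper, sub_lower = split_at(child, focus)
        upper_children.append(sub_upper)
        if sub_lower is not None:
            lower = sub_lower
    return Node(root.label, upper_children), lower
```

With title = Node("<TITLE>", [Node("let"), Node("it"), Node("be")]) and root = Node("S", [Node("play"), title, Node("now")]), split_at(root, title) produces two subtrees. The upper subtree has leaves ["play", "<TITLE>", "now"], which is material for the higher-level model. The lower subtree has leaves ["let", "it", "be"], which is material for the lower-level model.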

8. (Previously Presented) The language model generation and accumulation 
apparatus according to Claim 7, 

wherein the lower-level N-gram language model generation and accumulation unit 
includes 

a language model generation exception word judgment unit operable to judge a specific 
word appearing in the second subtree as an exception word based on a predetermined linguistic 
property, the exception word being a word not being included as a constituent word of any 
subtree, and 
the lower-level N-gram language model generation and accumulation unit generates the
lower-level N-gram language model by dividing the exception word into (i) a syllable that is a 
basic phonetic unit constituting a pronunciation of the word and (ii) a unit that is obtained by 
combining syllables, and then by modeling a sequence made up of the syllable and the unit 
obtained by combining syllables in dependency on a location of the exception word in the 
syntactic tree and on the linguistic property of the exception word. 

9. (Previously Presented) The language model generation and accumulation 
apparatus according to Claim 1, further comprising 

a syntactic tree generation unit operable to perform morphemic analysis as well as 
syntactic analysis of a text, and generate a syntactic tree in which the text is structured by a 
plurality of layers, focusing on a node that is on the syntactic tree and that has been selected 
based on a predetermined criterion, 

wherein the higher-level N-gram language model generation and accumulation unit 
generates the higher-level N-gram language model, using a first subtree that constitutes a highest 
layer of the syntactic tree, and 

the lower-level N-gram language model generation and accumulation unit categorizes 
each subtree constituting a layer lower than a second layer based on a positioning of each 
subtree when included in the upper layer, and generates the lower-level N-gram language model 
by use of each of the categorized subtrees. 

10. (Previously Presented) The language model generation and accumulation 
apparatus according to Claim 9, 

wherein the lower-level N-gram language model generation and accumulation unit 
includes 

a language model generation exception word judgment unit operable to judge, as an 
exception word, a specific word appearing in any subtree in a layer lower than the second layer 
based on a predetermined linguistic property, the exception word being a word not being 
included as a constituent word of any subtree, and 
the lower-level N-gram language model generation and accumulation unit divides the
exception word into (i) a syllable that is a basic phonetic unit constituting a pronunciation of the 
word and (ii) a unit that is obtained by combining syllables, and generates the lower-level
N-gram language model by modeling a sequence made up of the syllable and the unit obtained by
combining syllables in dependency on a position of the exception word in the syntactic tree and 
on the linguistic property of the exception word. 

11. (Previously Presented) The language model generation and accumulation
apparatus according to Claim 1,

wherein the higher-level N-gram language model generation and accumulation unit 
generates the higher-level N-gram language model in which each sequence of N words including 
the word string class is associated with a probability at which each sequence of N words occurs. 

12. (Previously Presented) The language model generation and accumulation 
apparatus according to Claim 1, 

wherein the lower-level N-gram language model generation and accumulation unit 
generates the lower-level N-gram language model by associating each of an N-long chain of 
words constituting the word string class with a probability at which each of the N-long chain of 
words occurs. 
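For illustration only, the per-class lower-level model of Claim 12 can be sketched as a maximum-likelihood N-gram table kept separately for each word string class. The class boundary tokens <c> and </c>, the default N = 2, and the absence of smoothing are assumptions of this sketch.

```python
from collections import Counter, defaultdict


def train_class_ngrams(examples, n=2):
    """Per-class maximum-likelihood N-gram model over words inside each class.

    examples: iterable of (class_label, words) pairs, where `words` is the
    word string that filled the class. Returns
    {label: {ngram_tuple: probability}} with
    probability = c(ngram) / c(history), history = the first n-1 tokens.
    """
    ngram_counts = defaultdict(Counter)
    history_counts = defaultdict(Counter)
    for label, words in examples:
        padded = ["<c>"] * (n - 1) + list(words) + ["</c>"]
        for i in range(len(padded) - n + 1):
            gram = tuple(padded[i:i + n])
            ngram_counts[label][gram] += 1
            history_counts[label][gram[:-1]] += 1
    return {
        label: {g: c / history_counts[label][g[:-1]] for g, c in grams.items()}
        for label, grams in ngram_counts.items()
    }
```

Training on ("<TITLE>", ["let", "it", "be"]) and ("<TITLE>", ["let", "it", "go"]) gives P(be | it) = 0.5 and P(it | let) = 1.0 within the <TITLE> class. These probabilities are independent of how titles are distributed in ordinary sentences.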

13. (Currently Amended) A speech recognition apparatus that recognizes a speech
which is a sequence of uttered words, using the following: 

a higher-level N-gram language model that is obtained by modeling each of a plurality of 
texts as a sequence of words that includes a word string class indicating a linguistic property of a 
word string constituting two or more words; and

a lower-level N-gram language model that is obtained by modeling a first sequence of two or
more words within the word string class having a specific linguistic property; and

a higher-level N-gram language model that is obtained by modeling the first sequence of
words modeled in the lower-level N-gram language model as a word string class and a plurality of
texts as a second sequence of words that includes the word string class.

14. (Currently Amended) A speech recognition apparatus that recognizes a sequence 
of uttered words, comprising: 

a higher-level N-gram language model generation and accumulation unit operable to generate
and accumulate a higher-level N-gram language model that is obtained by modeling each of a
plurality of texts as a sequence of words that includes a word string class indicating a linguistic
property of a word string constituting two or more words; and

a lower-level N-gram language model generation and accumulation unit operable to generate
and accumulate a lower-level N-gram language model that is obtained by modeling a first sequence
of two or more words within the word string class having a specific linguistic property; and

a higher-level N-gram language model generation and accumulation unit operable to generate
and accumulate a higher-level N-gram language model that is obtained by modeling the first
sequence of words modeled in the lower-level N-gram language model as a word string class and a
plurality of texts as a second sequence of words that includes the word string class, and

the speech recognition apparatus recognizes the speech by use of the higher-level N-gram 
language model that is accumulated by the higher-level N-gram language model generation and 
accumulation unit and the lower-level N-gram language model that is accumulated by the
lower-level N-gram language model generation and accumulation unit.

15. (Previously Presented) The speech recognition apparatus according to Claim 14, 
wherein the higher-level N-gram language model generation and accumulation unit and
the lower-level N-gram language model generation and accumulation unit generate the
respective language models, using different corpuses, and 

the speech recognition apparatus recognizes speech by use of the higher-level N-gram 
language model and the lower-level N-gram language model respectively constructed using the 
different corpuses. 

16. (Previously Presented) The speech recognition apparatus according to Claim 15,
wherein the lower-level N-gram language model generation and accumulation unit
includes

a corpus update unit operable to update a corpus for the lower-level N-gram language
model,

the lower-level N-gram language model generation and accumulation unit updates the 
lower-level N-gram language model based on the updated corpus, and generates the updated 
lower-level N-gram language model, and 

the speech recognition apparatus recognizes the speech by use of the updated lower-level 
N-gram language model. 

17. (Previously Presented) The speech recognition apparatus according to Claim 14, 
wherein the lower-level N-gram language model generation and accumulation unit
analyzes a sequence of words within the word string class into one or more morphemes that are
the smallest language units having meanings, and generates the lower-level N-gram language 
model by modeling each sequence of the one or more morphemes based on the word string class, 
and 

the speech recognition apparatus recognizes the speech by use of the lower-level N-gram
language model that has been modeled as the sequence of the one or more morphemes. 

18. (Previously Presented) The speech recognition apparatus according to Claim 14, 
wherein the higher-level N-gram language model generation and accumulation unit
substitutes the word string class with a virtual word, and then generates the higher-level N-gram
language model by modeling a sequence made up of the virtual word and other words, the word 
string class being included in each of the plurality of texts analyzed into morphemes, and 

the speech recognition apparatus recognizes the speech by use of the higher-level N-gram 
language model that has been modeled as the sequence made up of the virtual word and other 
words. 

19. (Previously Presented) The speech recognition apparatus according to Claim 18,
wherein the lower-level N-gram language model generation and accumulation unit
includes

an exception word judgment unit operable to judge whether or not a specific word out of 
a plurality of words that appear in the word string class should be treated as an exception word, 
based on a linguistic property of the specific word, and divides the exception word into (i) a 
syllable that is a basic phonetic unit constituting a pronunciation of the exception word and (ii) a 
unit that is obtained by combining syllables based on a result of the judgment, the exception 
word being a word not being included as a constituent word of the word string class, 

the language model generation and accumulation apparatus further comprises
a class dependent syllable N-gram generation and accumulation unit operable to generate
class dependent syllable N-grams by modeling a sequence made up of the syllable and the unit
obtained by combining syllables and by providing a language likelihood to the sequence in 
dependency on either the word string class or the linguistic property of the exception word, and 
accumulate the generated class dependent syllable N-grams, the language likelihood being a 
logarithm value of a probability, and 

the speech recognition apparatus recognizes the speech by use of the class dependent 
syllable N-grams. 

20. (Previously Presented) The speech recognition apparatus according to Claim 19,

wherein the language model generation and accumulation apparatus further comprises
a syntactic tree generation unit operable to perform morphemic analysis as well as
syntactic analysis of a text, and generate a syntactic tree in which the text is structured by a
plurality of layers, focusing on a node that is on the syntactic tree and that has been selected on
the basis of a predetermined criterion,

wherein the higher-level N-gram language model generation and accumulation unit
generates the higher-level N-gram language model for syntactic tree, using a first subtree that
constitutes an upper layer from the focused node, and

the lower-level N-gram language model generation and accumulation unit generates the 
lower-level N-gram language model for syntactic tree, using a second subtree that constitutes a
lower layer from the focused node, and 

the speech recognition apparatus comprises: 

an acoustic processing unit operable to generate feature parameters from the speech; 

a word comparison unit operable to compare a pronunciation of each word with each of 
the feature parameters, and generate a set of word hypotheses including an utterance segment of 
each word and an acoustic likelihood of each word; and 

a word string hypothesis generation unit operable to generate a word string hypothesis 
from the set of word hypotheses with reference to the higher-level N-gram language model for 
syntactic tree and the lower-level N-gram language model for syntactic tree, and generate a result 
of the speech recognition. 

21. (Previously Presented) The speech recognition apparatus according to Claim 20,
wherein the lower-level N-gram language model generation and accumulation unit
includes

a language model generation exception word judgment unit operable to judge a specific 
word appearing in the second subtree as an exception word based on a predetermined linguistic 
property, the exception word being a word not being included as a constituent word of any 
subtree,

the lower-level N-gram language model generation and accumulation unit generates the 
lower-level N-gram language model by dividing the exception word into (i) a syllable that is a 
basic phonetic unit constituting a pronunciation of the word and (ii) a unit that is obtained by 
combining syllables, and then by modeling a sequence made up of the syllable and the unit 
obtained by combining syllables in dependency on a location of the exception word in the 
syntactic tree and on the linguistic property of the exception word, and 

the word string hypothesis generation unit generates the result of the speech recognition. 

22. (Previously Presented) The speech recognition apparatus according to Claim 14, 

wherein the language model generation and accumulation apparatus further comprises 
a syntactic tree generation unit operable to perform morphemic analysis as well as
syntactic analysis of a text, and generate a syntactic tree in which the text is structured by a 
plurality of layers, focusing on a node that is on the syntactic tree and that has been selected on 
the basis of a predetermined criterion, 

wherein the higher-level N-gram language model generation and accumulation unit 
generates the higher-level N-gram language model, using a first subtree that constitutes a highest 
layer of the syntactic tree, 

the lower-level N-gram language model generation and accumulation unit categorizes 
each subtree constituting a layer lower than a second layer based on a positioning of each
subtree when included in the upper layer and generates the lower-level N-gram language model
by use of each of the categorized subtrees, and

the speech recognition apparatus recognizes the speech by use of the higher-level N-gram 
language model that has been generated using the first subtree and the lower-level N-gram 
language model that has been generated using each subtree constituting a layer lower than the 
second layer. 

23. (Previously Presented) The speech recognition apparatus according to Claim 22, 

wherein the lower-level N-gram language model generation and accumulation unit 
includes 

a language model generation exception word judgment unit operable to judge, as an 
exception word, a specific word appearing in any subtree in a layer lower than the second layer 
based on a predetermined linguistic property, the exception word being a word not being 
included as a constituent word of any subtree, 

the lower-level N-gram language model generation and accumulation unit divides the 
exception word into (i) a syllable that is a basic phonetic unit constituting a pronunciation of the 
word and (ii) a unit that is obtained by combining syllables, and generates the lower-level N- 
gram language model by modeling a sequence made up of the syllable and the unit obtained by 
combining syllables in dependency on a position of the exception word in the syntactic tree and 
on the linguistic property of the exception word, and 
the speech recognition apparatus recognizes the speech by use of the higher-level N-gram
language model that does not include the exception word and the lower-level N-gram language 
model that includes the exception word. 

24. (Previously Presented) The speech recognition apparatus according to Claim 14, 

wherein the higher-level N-gram language model generation and accumulation unit 
generates the higher-level N-gram language model in which each sequence of N words including 
the word string class is associated with a probability at which each sequence of N words occurs,
and 

the speech recognition apparatus comprises 
a word string hypothesis generation unit operable to evaluate a word string hypothesis 
by multiplying each probability at which each sequence of N words including the word string
class occurs. 

25. (Previously Presented) The speech recognition apparatus according to Claim 14, 

wherein the lower-level N-gram language model generation and accumulation unit 
generates the lower-level N-gram language model by associating each N-long chain of words 
constituting the word string class with a probability at which each chain of words occurs, and

the speech recognition apparatus comprises 

a word string hypothesis generation unit operable to evaluate a word string hypothesis 
by multiplying each probability at which each sequence of N words inside the word string
class occurs. 
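The evaluation in Claims 24 and 25 can be sketched as follows, for illustration only. A hypothesis is scored by multiplying the higher-level N-gram probabilities of the token sequence, in which the class appears as one token, by the lower-level N-gram probabilities of the words inside the class. The sketch makes several assumptions:

- Both models are bigram tables given as plain dictionaries.
- Each class token occurs at most once per hypothesis.
- The boundary tokens <s>, </s>, <c> and </c> and the function names are hypothetical.

```python
def sequence_probability(bigram_probs, tokens, start="<s>", end="</s>"):
    """Product of P(w_i | w_{i-1}) over the padded tokens; 0.0 if unseen."""
    padded = [start] + list(tokens) + [end]
    p = 1.0
    for w1, w2 in zip(padded, padded[1:]):
        p *= bigram_probs.get((w1, w2), 0.0)
    return p


def hypothesis_probability(higher, lower, tokens, class_fillers):
    """P_higher(tokens) times P_lower(filler words | class) for each class token.

    `class_fillers` maps a class token in `tokens` to the words it stands for;
    `lower` maps that class token to its own bigram table with <c>/</c> bounds.
    """
    p = sequence_probability(higher, tokens)
    for token in tokens:
        if token in class_fillers:
            p *= sequence_probability(lower[token], class_fillers[token], "<c>", "</c>")
    return p
```

Suppose the higher-level probabilities are P(play | <s>) = 0.5, P(<TITLE> | play) = 0.4 and P(</s> | <TITLE>) = 0.5. Suppose the in-class probabilities of "let it be" multiply to 0.05. The hypothesis "play <TITLE>" with the class filled by "let it be" then scores 0.1 × 0.05 = 0.005. Real decoders usually add log probabilities instead of multiplying probabilities, to avoid numeric underflow.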

26. (Currently Amended) A language model generation method for generating 
language models for speech recognition, comprising: 

a higher-level N-gram language model generation and accumulation step for generating 
and accumulating a higher-level N-gram language model that is obtained by modeling each of a
plurality of texts as a sequence of words that includes a word string class indicating a linguistic 
property of a word string constituting two or more words; and 
a lower-level N-gram language model generation and accumulation step for generating and
accumulating a lower-level N-gram language model that is obtained by modeling a first sequence of
two or more words within the word string class having a specific linguistic property; and

a higher-level N-gram language model generation and accumulation step for generating and
accumulating a higher-level N-gram language model that is obtained by modeling the first sequence
of words modeled in the lower-level N-gram language model as a word string class and a plurality of
texts as a second sequence of words that includes the word string class.

27. (Currently Amended) A speech recognition method for recognizing a speech 
which is a sequence of uttered words, using the following: 

a higher-level N-gram language model that is obtained by modeling each of a plurality of 
texts as a sequence of words that includes a word string class indicating a linguistic property of a 
word string constituting two or more words; and 

a lower-level N-gram language model that is obtained by modeling a first sequence of two or
more words within the word string class having a specific linguistic property; and 

a higher-level N-gram language model that is obtained by modeling the first sequence of
words modeled in the lower-level N-gram language model as a word string class and a plurality of
texts as a second sequence of words that includes the word string class.

28. (Previously Presented) The speech recognition method according to Claim 27, 
further comprising 

a step of categorizing each word string having a specific linguistic property as a word 
string class, and providing, to each word string, a language likelihood which is a logarithm
value of a probability, by use of class dependent word N-grams that are obtained by modeling the 
word string class in dependency on the word string class based on a linguistic relationship 
between words constituting the word string class; 

a step of analyzing a text into a word and the word string class, and providing, to a 
sequence of the word and the word string class, a language likelihood which is a logarithm value
of a probability, by use of class N-grams that are obtained by modeling the sequence of the word 
and the word string class based on a linguistic relationship; and

a step of (i) comparing feature parameters extracted from a series of speeches with a
pronunciation as well as an acoustic characteristic of each word and generating a set of word
hypotheses including an utterance segment of each word and an acoustic likelihood of
each word, (ii) generating a word string hypothesis from the set of word hypotheses with
reference to the class N-grams and the class dependent word N-grams, and (iii) outputting a 
result of the speech recognition. 
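For illustration only, step (ii) of Claim 28 can be sketched as ranking word string hypotheses by a total log score. The score is the sum of the per-word acoustic log-likelihoods plus the language log-likelihood. The language log-likelihood is taken as a caller-supplied function, which would combine the class N-gram and class dependent word N-gram log probabilities. The lm_weight scaling factor is an assumption commonly used in decoders and is not recited in the claim; lm_weight = 1.0 leaves the two terms unscaled.

```python
import math


def score_hypothesis(words_with_acoustic, language_log_prob, lm_weight=1.0):
    """Acoustic log-likelihoods summed over words, plus weighted LM log-likelihood.

    words_with_acoustic: list of (word, acoustic_log_likelihood) pairs.
    language_log_prob: hypothetical function mapping a word list to a log prob.
    """
    acoustic = sum(a for _, a in words_with_acoustic)
    words = [w for w, _ in words_with_acoustic]
    return acoustic + lm_weight * language_log_prob(words)


def best_hypothesis(hypotheses, language_log_prob, lm_weight=1.0):
    """Return the word string of the highest-scoring hypothesis."""
    best = max(hypotheses,
               key=lambda h: score_hypothesis(h, language_log_prob, lm_weight))
    return [w for w, _ in best]
```

In a toy case, "pray let it be" may have a slightly better acoustic score than "play let it be". The language likelihood can still reverse that ranking when it strongly prefers "play". Setting lm_weight to 0.0 falls back to the acoustic ranking alone.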

29. (Currently Amended) A program for performing a language model generation 
method that is intended for generating a language model for speech recognition, the program 
causing a computer to execute the following steps: 

a higher-level N-gram language model generation and accumulation step for generating 
and accumulating a higher-level N-gram language model that is obtained by modeling each of a
plurality of texts as a sequence of words that includes a word string class indicating a linguistic 
property of a word string constituting two or more words; and 

a lower-level N-gram language model generation and accumulation step for generating and 
accumulating a lower-level N-gram language model that is obtained by modeling a first sequence of
two or more words within the word string class having a specific linguistic property; and

a higher-level N-gram language model generation and accumulation step for generating and
accumulating a higher-level N-gram language model that is obtained by modeling the first sequence
of words modeled in the lower-level N-gram language model as a word string class and a plurality of
texts as a second sequence of words that includes the word string class.

30. (Currently Amended) A program for performing a speech recognition method that 
is intended for recognizing a sequence of uttered words, the program causing a computer to 
execute a speech recognition step that is performed by use of the following: 

a higher-level N-gram language model that is obtained by modeling each of a plurality of 
texts as a sequence of words that includes a word string class indicating a linguistic property of a 
word string constituting two or more words; and

a lower-level N-gram language model that is obtained by modeling a first sequence of two or
more words within the word string class having a specific linguistic property; and 

a higher-level N-gram language model that is obtained by modeling the first sequence of
words modeled in the lower-level N-gram language model as a word string class and a plurality of
texts as a second sequence of words that includes the word string class.

31. (New) The language model generation and accumulation apparatus according to
Claim 1,

wherein the lower-level N-gram language model generation and accumulation unit is 
operable to represent a first sequence of words having a common linguistic property as the word 
string class, to generate and to accumulate, for each word string class, the lower-level N-gram 
language model that is obtained by modeling the first sequence of words included in the word 
string class; and 

the higher-level N-gram language model generation and accumulation unit is operable to 
replace the first sequence of words modeled in the lower-level N-gram language model included in
a text which is the sequence of words with a word string class corresponding to the first sequence of
words, and to generate and to accumulate a higher-level N-gram language model that is obtained by
modeling the text which is the character string as a sequence of words that includes the word string 
class and a second sequence of words, 

each word included in the first sequence of words and each word included in the second 
sequence of words are respectively morphemes which are smallest linguistic units that have 
meaning, and 

the lower-level N-gram language model generation and accumulation unit is operable to 
generate and accumulate, for each word string class, the first sequence of words having the linguistic 
property indicated by the word string class. 